The freshness of web search engine databases

نویسندگان

  • Dirk Lewandowski
  • Henry Wahlig
  • Gunnar Meyer-Bautor
چکیده

This study measures the frequency in which search engines update their indices. Therefore, 38 websites that are updated on a daily basis were analysed within a time-span of six weeks. The analysed search engines were Google, Yahoo and MSN. We find that Google performs best overall with the most pages updated on a daily basis, but only MSN is able to update all pages within a time-span of less than 20 days. Both other engines have outliers that are quite older. In terms of indexing patterns, we find different approaches at the different engines: While MSN shows clear update patterns, Google shows some outliers and the update process of the Yahoo index seems to be quite chaotic. Implications are that the quality of different search engine indices varies and not only one engine should be used when searching for current content.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Internet Search Engine Freshness by Web Server Help

We study how to keep the Internet search engines up-todate with the changes occurring at the various web servers in the Internet. Currently, web search engines poll the web servers on a per-URL basis for obtaining update information. We advocate an approach in which web servers themselves track the changes happening to their content files for propagating updates to search engines. We propose an...

متن کامل

Towards a Content-Provider-Friendly Web Page Crawler

Search engine quality is impacted by two factors: the quality of the ranking/matching algorithm used and the freshness of the search engine’s index, which maintains a “snapshot” of the Web. Web crawlers capture web pages and refresh the index, but this is always a never-ending quest, as web pages get updated frequently (and thus have to be re-crawled). Knowing when to re-crawl a web page is fun...

متن کامل

Temporal Ranking of Search Engine Results

Existing search engines contain the picture of the Web from the past and their ranking algorithms are based on data crawled some time ago. However, a user requires not only relevant but also fresh information. We have developed a method for adjusting the ranking of search engine results from the point of view of page freshness and relevance. It uses an algorithm that postprocesses search engine...

متن کامل

A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing the search engine results, ranking takes place according to one or more parameters such as; Backward Links, Forward Links, Content, click through rate and etc. The quality and performance of these algorithms depend on the listed parameters. The ranking is one of the most important components of the search engine that represents the degree of the vitality...

متن کامل

Towards Serving "Delicious" Information within Its Freshness Date

Like freshness date of food, Web information also has its “shelf life”. In this paper, we exploratively study the reflection of shelf life of information in browsing behaviors. Our analysis shows that the satisfaction of browsing behavior is modified if the shelf life of information could be considered by search engines.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • J. Information Science

دوره 32  شماره 

صفحات  -

تاریخ انتشار 2006